Frequent-Itemset Mining Using Locality-Sensitive Hashing
نویسندگان
چکیده
The Apriori algorithm is a classical algorithm for the frequent itemset mining problem. A significant bottleneck in Apriori is the number of I/O operation involved, and the number of candidates it generates. We investigate the role of LSH techniques to overcome these problems, without adding much computational overhead. We propose randomized variations of Apriori that are based on asymmetric LSH defined over Hamming distance and Jaccard similarity.
منابع مشابه
Parallel Rule Mining with Dynamic Data Distribution under Heterogeneous Cluster Environment
Big data mining methods supports knowledge discovery on high scalable, high volume and high velocity data elements. The cloud computing environment provides computational and storage resources for the big data mining process. Hadoop is a widely used parallel and distributed computing platform for big data analysis and manages the homogeneous and heterogeneous computing models. The MapReduce fra...
متن کاملDetecting Frequent Patterns in Video Using Partly Locality Sensitive Hashing
Frequent patterns in video are useful clues to learn previously unknown events in an unsupervised way. This paper presents a novel method for detecting relatively long variable-length frequent patterns in video efficiently. The major contribution of the paper is that Partly Locality Sensitive Hashing (PLSH) is proposed as a sparse sampling method to detect frequent patterns faster than the conv...
متن کاملPrivacy Protection Using Sensitive Data Protection Algorithm In Frequent Itemset Mining Of Medical Datasets
Frequent Itemset Mining (FIM) is one of the most eminent techniques in the Data mining systems. The exploration of Frequent Itemset Mining distills the recurring knowledge from the incessant data. Explosion of Frequent Itemset Mining in the field of Data Analysis and Data Mining becomes an inescapable one. The paper focuses on “searching the accurate records of efficient database queries withou...
متن کاملPrivacy Preserving Frequent Itemset Mining by Reducing Sensitive Items Frequency using GA
Frequent Itemset mining extracts novel and useful knowledge from large repositories of data and this knowledge is useful for effective analysis and decision making in telecommunication networks, marketing, medical analysis, website linkages, financial transactions, advertising and other applications. The misuse of these techniques may lead to disclosure of sensitive information. Motivated by th...
متن کاملPrivacy-Preserving Frequent Itemset Mining for Sparse and Dense Data
Frequent itemset mining is a task that can in turn be used for other purposes such as associative rule mining. One problem is that the data may be sensitive, and its owner may refuse to give it for analysis in plaintext. There exist many privacy-preserving solutions for frequent itemset mining, but in any case enhancing the privacy inevitably spoils the efficiency. Leaking some less sensitive i...
متن کامل